Overview

Dataset statistics

Number of variables: 12
Number of observations: 6362620
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.7 GiB
Average record size in memory: 279.4 B

Variable types

Numeric: 7
Categorical: 5

Alerts

nameOrig has a high cardinality: 6353307 distinct values (High cardinality)
nameDest has a high cardinality: 2722362 distinct values (High cardinality)
amount is highly correlated with oldbalanceDest and 2 other fields (High correlation)
oldbalanceOrg is highly correlated with newbalanceOrig (High correlation)
newbalanceOrig is highly correlated with oldbalanceOrg (High correlation)
oldbalanceDest is highly correlated with amount and 2 other fields (High correlation)
newbalanceDest is highly correlated with amount and 2 other fields (High correlation)
amount_converted is highly correlated with amount and 2 other fields (High correlation)
amount is highly correlated with amount_converted (High correlation)
oldbalanceDest is highly correlated with newbalanceDest (High correlation)
newbalanceDest is highly correlated with oldbalanceDest (High correlation)
amount_converted is highly correlated with amount (High correlation)
type is highly correlated with newbalanceOrig (High correlation)
newbalanceOrig is highly correlated with type and 1 other field (High correlation)
amount is highly skewed (γ1 = 30.99394948) (Skewed)
amount_converted is highly skewed (γ1 = 30.99394948) (Skewed)
nameOrig is uniformly distributed (Uniform)
oldbalanceOrg has 2102449 (33.0%) zeros (Zeros)
newbalanceOrig has 3609566 (56.7%) zeros (Zeros)
oldbalanceDest has 2704388 (42.5%) zeros (Zeros)
newbalanceDest has 2439433 (38.3%) zeros (Zeros)

Reproduction

Analysis started: 2024-11-29 23:47:33.707697
Analysis finished: 2024-11-29 23:59:02.052173
Duration: 11 minutes and 28.34 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

step
Real number (ℝ≥0)

Distinct: 743
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 243.3972456
Minimum: 1
Maximum: 743
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 97.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 16
Q1: 156
Median: 239
Q3: 335
95-th percentile: 490
Maximum: 743
Range: 742
Interquartile range (IQR): 179

Descriptive statistics

Standard deviation: 142.331971
Coefficient of variation (CV): 0.5847723161
Kurtosis: 0.329070555
Mean: 243.3972456
Median Absolute Deviation (MAD): 92
Skewness: 0.3751768885
Sum: 1548644183
Variance: 20258.38998
Monotonicity: Increasing
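
Several of the figures reported for step follow directly from others: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are simple differences. The sketch below recomputes them from the rounded values printed in this report, so small rounding differences are expected. The variable names are illustrative only.

```python
# Values copied from the `step` statistics in this report (already rounded).
mean = 243.3972456
std_dev = 142.331971
minimum, maximum = 1, 743
q1, q3 = 156, 335

# Derived statistics.
coefficient_of_variation = std_dev / mean  # CV = standard deviation / mean
variance = std_dev ** 2                    # variance = standard deviation squared
value_range = maximum - minimum            # range = max - min
iqr = q3 - q1                              # interquartile range = Q3 - Q1

print(f"CV       = {coefficient_of_variation:.6f}")
print(f"Variance = {variance:.3f}")
print(f"Range    = {value_range}")
print(f"IQR      = {iqr}")
```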
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count     Frequency (%)
19                  51352     0.8%
18                  49579     0.8%
187                 49083     0.8%
235                 47491     0.7%
307                 46968     0.7%
163                 46352     0.7%
139                 46054     0.7%
403                 45155     0.7%
43                  45060     0.7%
355                 44787     0.7%
Other values (733)  5890739   92.6%

Smallest values

Value   Count   Frequency (%)
1       2708    < 0.1%
2       1014    < 0.1%
3       552     < 0.1%
4       565     < 0.1%
5       665     < 0.1%
6       1660    < 0.1%
7       6837    0.1%
8       21097   0.3%
9       37628   0.6%
10      35991   0.6%

Largest values

Value   Count   Frequency (%)
743     8       < 0.1%
742     14      < 0.1%
741     22      < 0.1%
740     6       < 0.1%
739     10      < 0.1%
738     10      < 0.1%
737     10      < 0.1%
736     14      < 0.1%
735     12      < 0.1%
734     8       < 0.1%

type
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 439.4 MiB

Length

Max length: 8
Median length: 7
Mean length: 7.422395963
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAYMENT
2nd row: PAYMENT
3rd row: TRANSFER
4th row: CASH_OUT
5th row: PAYMENT

Common Values

Value      Count     Frequency (%)
CASH_OUT   2237500   35.2%
PAYMENT    2151495   33.8%
CASH_IN    1399284   22.0%
TRANSFER   532909    8.4%
DEBIT      41432     0.7%

Length

Histogram of lengths of the category

Pie chart

Value      Count     Frequency (%)
cash_out   2237500   35.2%
payment    2151495   33.8%
cash_in    1399284   22.0%
transfer   532909    8.4%
debit      41432     0.7%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

amount
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5316900
Distinct (%): 83.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179861.9035
Minimum: 0
Maximum: 92445516.64
Zeros: 16
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 97.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2224.0995
Q1: 13389.57
Median: 74871.94
Q3: 208721.4775
95-th percentile: 518634.1965
Maximum: 92445516.64
Range: 92445516.64
Interquartile range (IQR): 195331.9075

Descriptive statistics

Standard deviation: 603858.2315
Coefficient of variation (CV): 3.357343715
Kurtosis: 1797.956705
Mean: 179861.9035
Median Absolute Deviation (MAD): 68393.655
Skewness: 30.99394948
Sum: 1.144392945 × 10^12
Variance: 3.646447637 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                   Count     Frequency (%)
10000000                3207      0.1%
10000                   88        < 0.1%
5000                    79        < 0.1%
15000                   68        < 0.1%
500                     65        < 0.1%
100000                  42        < 0.1%
21500                   37        < 0.1%
120000                  29        < 0.1%
135000                  20        < 0.1%
0                       16        < 0.1%
Other values (5316890)  6358969   99.9%

Smallest values

Value   Count   Frequency (%)
0       16      < 0.1%
0.01    1       < 0.1%
0.02    3       < 0.1%
0.03    2       < 0.1%
0.04    1       < 0.1%
0.06    1       < 0.1%
0.07    1       < 0.1%
0.09    1       < 0.1%
0.1     1       < 0.1%
0.11    2       < 0.1%

Largest values

Value         Count   Frequency (%)
92445516.64   1       < 0.1%
73823490.36   1       < 0.1%
71172480.42   1       < 0.1%
69886731.3    1       < 0.1%
69337316.27   1       < 0.1%
67500761.29   1       < 0.1%
66761272.21   1       < 0.1%
64234448.19   1       < 0.1%
63847992.58   1       < 0.1%
63294839.63   1       < 0.1%

nameOrig
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 6353307
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 458.0 MiB

Length

Max length: 11
Median length: 11
Mean length: 10.48232332
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6344009
Unique (%): 99.7%

Sample

1st row: C1231006815
2nd row: C1666544295
3rd row: C1305486145
4th row: C840083671
5th row: C2048537720

Common Values

Value                   Count     Frequency (%)
C2098525306             3         < 0.1%
C1902386530             3         < 0.1%
C545315117              3         < 0.1%
C1999539787             3         < 0.1%
C1784010646             3         < 0.1%
C400299098              3         < 0.1%
C1065307291             3         < 0.1%
C1976208114             3         < 0.1%
C2051359467             3         < 0.1%
C1462946854             3         < 0.1%
Other values (6353297)  6362590   > 99.9%

Length

Histogram of lengths of the category
Value                   Count     Frequency (%)
c2098525306             3         < 0.1%
c1976208114             3         < 0.1%
c724452879              3         < 0.1%
c363736674              3         < 0.1%
c1832548028             3         < 0.1%
c1530544995             3         < 0.1%
c1462946854             3         < 0.1%
c2051359467             3         < 0.1%
c1677795071             3         < 0.1%
c1065307291             3         < 0.1%
Other values (6353297)  6362590   > 99.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

oldbalanceOrg
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1845844
Distinct (%): 29.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 833883.1041
Minimum: 0
Maximum: 59585040.37
Zeros: 2102449
Zeros (%): 33.0%
Negative: 0
Negative (%): 0.0%
Memory size: 97.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 14208
Q3: 107315.175
95-th percentile: 5823702.278
Maximum: 59585040.37
Range: 59585040.37
Interquartile range (IQR): 107315.175

Descriptive statistics

Standard deviation: 2888242.673
Coefficient of variation (CV): 3.46360618
Kurtosis: 32.96487854
Mean: 833883.1041
Median Absolute Deviation (MAD): 14208
Skewness: 5.249136421
Sum: 5.305681316 × 10^12
Variance: 8.341945738 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                   Count     Frequency (%)
0                       2102449   33.0%
184                     918       < 0.1%
133                     914       < 0.1%
195                     912       < 0.1%
164                     909       < 0.1%
181                     908       < 0.1%
109                     908       < 0.1%
157                     902       < 0.1%
146                     899       < 0.1%
136                     898       < 0.1%
Other values (1845834)  4252003   66.8%

Smallest values

Value   Count     Frequency (%)
0       2102449   33.0%
0.05    1         < 0.1%
0.18    1         < 0.1%
0.21    1         < 0.1%
0.44    1         < 0.1%
0.67    1         < 0.1%
1       370       < 0.1%
1.02    1         < 0.1%
1.37    1         < 0.1%
1.38    1         < 0.1%

Largest values

Value         Count   Frequency (%)
59585040.37   1       < 0.1%
57316255.05   1       < 0.1%
50399045.08   1       < 0.1%
49585040.37   1       < 0.1%
47316255.05   1       < 0.1%
45674547.89   1       < 0.1%
44892193.09   1       < 0.1%
43818855.3    1       < 0.1%
43686616.33   1       < 0.1%
42542664.27   1       < 0.1%

newbalanceOrig
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2682586
Distinct (%): 42.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 855113.6686
Minimum: 0
Maximum: 49585040.37
Zeros: 3609566
Zeros (%): 56.7%
Negative: 0
Negative (%): 0.0%
Memory size: 97.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 144258.41
95-th percentile: 5980262.336
Maximum: 49585040.37
Range: 49585040.37
Interquartile range (IQR): 144258.41

Descriptive statistics

Standard deviation: 2924048.503
Coefficient of variation (CV): 3.419485164
Kurtosis: 32.06698456
Mean: 855113.6686
Median Absolute Deviation (MAD): 0
Skewness: 5.176884001
Sum: 5.44076333 × 10^12
Variance: 8.550059648 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                   Count     Frequency (%)
0                       3609566   56.7%
3420.22                 4         < 0.1%
7717.83                 4         < 0.1%
17979.82                4         < 0.1%
9897.82                 4         < 0.1%
3024.55                 4         < 0.1%
5405.6                  4         < 0.1%
7468.59                 4         < 0.1%
944.55                  4         < 0.1%
7070.1                  4         < 0.1%
Other values (2682576)  2753018   43.3%

Smallest values

Value   Count     Frequency (%)
0       3609566   56.7%
0.01    1         < 0.1%
0.03    1         < 0.1%
0.05    1         < 0.1%
0.12    1         < 0.1%
0.13    1         < 0.1%
0.18    1         < 0.1%
0.21    1         < 0.1%
0.23    1         < 0.1%
0.3     1         < 0.1%

Largest values

Value         Count   Frequency (%)
49585040.37   1       < 0.1%
47316255.05   1       < 0.1%
43686616.33   1       < 0.1%
43673802.21   1       < 0.1%
41690842.64   1       < 0.1%
41432359.46   1       < 0.1%
40399045.08   1       < 0.1%
39585040.37   1       < 0.1%
38946233.02   1       < 0.1%
38939424.03   1       < 0.1%

nameDest
Categorical

HIGH CARDINALITY

Distinct: 2722362
Distinct (%): 42.8%
Missing: 0
Missing (%): 0.0%
Memory size: 458.0 MiB

Length

Max length: 11
Median length: 11
Mean length: 10.48175201
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2262704
Unique (%): 35.6%

Sample

1st row: M1979787155
2nd row: M2044282225
3rd row: C553264065
4th row: C38997010
5th row: M1230701703

Common Values

Value                   Count     Frequency (%)
C1286084959             113       < 0.1%
C985934102              109       < 0.1%
C665576141              105       < 0.1%
C2083562754             102       < 0.1%
C1590550415             101       < 0.1%
C248609774              101       < 0.1%
C1789550256             99        < 0.1%
C451111351              99        < 0.1%
C1360767589             98        < 0.1%
C1023714065             97        < 0.1%
Other values (2722352)  6361596   > 99.9%

Length

Histogram of lengths of the category
Value                   Count     Frequency (%)
c1286084959             113       < 0.1%
c985934102              109       < 0.1%
c665576141              105       < 0.1%
c2083562754             102       < 0.1%
c1590550415             101       < 0.1%
c248609774              101       < 0.1%
c1789550256             99        < 0.1%
c451111351              99        < 0.1%
c1360767589             98        < 0.1%
c1023714065             97        < 0.1%
Other values (2722352)  6361596   > 99.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

oldbalanceDest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3614697
Distinct (%): 56.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1100701.667
Minimum: 0
Maximum: 356015889.4
Zeros: 2704388
Zeros (%): 42.5%
Negative: 0
Negative (%): 0.0%
Memory size: 97.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 132705.665
Q3: 943036.7075
95-th percentile: 5147229.713
Maximum: 356015889.4
Range: 356015889.4
Interquartile range (IQR): 943036.7075

Descriptive statistics

Standard deviation: 3399180.113
Coefficient of variation (CV): 3.088193846
Kurtosis: 948.6741254
Mean: 1100701.667
Median Absolute Deviation (MAD): 132705.665
Skewness: 19.92175792
Sum: 7.003346437 × 10^12
Variance: 1.155442544 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                   Count     Frequency (%)
0                       2704388   42.5%
10000000                615       < 0.1%
20000000                219       < 0.1%
30000000                86        < 0.1%
40000000                31        < 0.1%
102                     21        < 0.1%
198                     19        < 0.1%
160                     18        < 0.1%
125                     18        < 0.1%
132                     18        < 0.1%
Other values (3614687)  3657187   57.5%

Smallest values

Value   Count     Frequency (%)
0       2704388   42.5%
0.01    1         < 0.1%
0.03    1         < 0.1%
0.13    1         < 0.1%
0.33    1         < 0.1%
0.37    1         < 0.1%
0.79    1         < 0.1%
1       7         < 0.1%
1.39    1         < 0.1%
1.64    1         < 0.1%

Largest values

Value         Count   Frequency (%)
356015889.4   1       < 0.1%
355553416.3   1       < 0.1%
355381433.6   1       < 0.1%
355380483.5   1       < 0.1%
355185537.1   1       < 0.1%
328194464.9   1       < 0.1%
327998074.2   1       < 0.1%
327963024     1       < 0.1%
327852121.4   1       < 0.1%
327827763.4   1       < 0.1%

newbalanceDest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3555499
Distinct (%): 55.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1224996.398
Minimum: 0
Maximum: 356179278.9
Zeros: 2439433
Zeros (%): 38.3%
Negative: 0
Negative (%): 0.0%
Memory size: 97.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 214661.44
Q3: 1111909.25
95-th percentile: 5515715.903
Maximum: 356179278.9
Range: 356179278.9
Interquartile range (IQR): 1111909.25

Descriptive statistics

Standard deviation: 3674128.942
Coefficient of variation (CV): 2.999297751
Kurtosis: 862.1565079
Mean: 1224996.398
Median Absolute Deviation (MAD): 214661.44
Skewness: 19.35230206
Sum: 7.794186583 × 10^12
Variance: 1.349922348 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                   Count     Frequency (%)
0                       2439433   38.3%
10000000                53        < 0.1%
971418.91               32        < 0.1%
19169204.93             29        < 0.1%
1254956.07              25        < 0.1%
16532032.16             25        < 0.1%
1412484.09              22        < 0.1%
7364724.84              21        < 0.1%
1178808.14              21        < 0.1%
4743010.67              21        < 0.1%
Other values (3555489)  3922938   61.7%

Smallest values

Value   Count     Frequency (%)
0       2439433   38.3%
0.01    1         < 0.1%
0.33    1         < 0.1%
1.39    1         < 0.1%
1.64    1         < 0.1%
1.74    1         < 0.1%
2.15    1         < 0.1%
2.45    1         < 0.1%
2.71    1         < 0.1%
2.76    1         < 0.1%

Largest values

Value         Count   Frequency (%)
356179278.9   1       < 0.1%
356015889.4   1       < 0.1%
355553416.3   2       < 0.1%
355381433.6   1       < 0.1%
355380483.5   1       < 0.1%
355185537.1   1       < 0.1%
328431698.2   1       < 0.1%
328194464.9   1       < 0.1%
327998074.2   1       < 0.1%
327963024     1       < 0.1%

isFraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.5 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value   Count     Frequency (%)
0       6354407   99.9%
1       8213      0.1%

Length

Histogram of lengths of the category

Pie chart

Value   Count     Frequency (%)
0       6354407   99.9%
1       8213      0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

isFlaggedFraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.5 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count     Frequency (%)
0       6362604   > 99.9%
1       16        < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value   Count     Frequency (%)
0       6362604   > 99.9%
1       16        < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

amount_converted
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5316900
Distinct (%): 83.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 152882.618
Minimum: 0
Maximum: 78578689.14
Zeros: 16
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 97.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1890.484575
Q1: 11381.1345
Median: 63641.149
Q3: 177413.2559
95-th percentile: 440839.067
Maximum: 78578689.14
Range: 78578689.14
Interquartile range (IQR): 166032.1214

Descriptive statistics

Standard deviation: 513279.4967
Coefficient of variation (CV): 3.357343715
Kurtosis: 1797.956705
Mean: 152882.618
Median Absolute Deviation (MAD): 58134.60675
Skewness: 30.99394948
Sum: 9.72734003 × 10^11
Variance: 2.634558418 × 10^11
Monotonicity: Not monotonic
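
The CV, skewness and kurtosis reported for amount_converted are identical to those of amount (3.357343715, 30.99394948 and 1797.956705), which is what one would expect if one column is a constant multiple of the other. The sample rows suggest a factor of 0.85 (for example 9839.64 → 8363.6940). That factor is inferred from the data, not stated anywhere in the report. A quick check under that assumption, using the rounded values printed in this report:

```python
# Assumption: amount_converted = SCALE * amount, with SCALE inferred from the
# sample rows of this report (it is not documented in the report itself).
SCALE = 0.85

# Rounded statistics copied from the `amount` and `amount_converted` sections.
amount_mean, amount_std = 179861.9035, 603858.2315
converted_mean, converted_std = 152882.618, 513279.4967

# Mean and standard deviation scale linearly with the factor...
scaled_mean = SCALE * amount_mean
scaled_std = SCALE * amount_std

# ...while the coefficient of variation (std / mean) is unchanged by scaling.
cv_amount = amount_std / amount_mean
cv_converted = converted_std / converted_mean

print(f"scaled mean = {scaled_mean:.3f} (reported {converted_mean})")
print(f"scaled std  = {scaled_std:.4f} (reported {converted_std})")
print(f"CV amount = {cv_amount:.6f}, CV converted = {cv_converted:.6f}")
```

If the relationship holds, amount_converted carries no information beyond amount, which would also explain the mutual high-correlation alerts.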
Histogram with fixed size bins (bins=50)
Most frequent values

Value                   Count     Frequency (%)
8500000                 3207      0.1%
8500                    88        < 0.1%
4250                    79        < 0.1%
12750                   68        < 0.1%
425                     65        < 0.1%
85000                   42        < 0.1%
18275                   37        < 0.1%
102000                  29        < 0.1%
114750                  20        < 0.1%
0                       16        < 0.1%
Other values (5316890)  6358969   99.9%

Smallest values

Value    Count   Frequency (%)
0        16      < 0.1%
0.0085   1       < 0.1%
0.017    3       < 0.1%
0.0255   2       < 0.1%
0.034    1       < 0.1%
0.051    1       < 0.1%
0.0595   1       < 0.1%
0.0765   1       < 0.1%
0.085    1       < 0.1%
0.0935   2       < 0.1%

Largest values

Value         Count   Frequency (%)
78578689.14   1       < 0.1%
62749966.81   1       < 0.1%
60496608.36   1       < 0.1%
59403721.6    1       < 0.1%
58936718.83   1       < 0.1%
57375647.1    1       < 0.1%
56747081.38   1       < 0.1%
54599280.96   1       < 0.1%
54270793.69   1       < 0.1%
53800613.69   1       < 0.1%

Interactions

[49 interaction plots (Matplotlib SVG figures); images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
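
The definition above can be sketched in plain Python: rank both variables, then apply the Pearson formula to the ranks. This is an illustrative sketch, not the profiler's own code. The helper names are hypothetical, and tied values are given the average of their positions, which is a common convention but an assumption here.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but clearly nonlinear
rho = spearman_rho(x, y)
print(rho)
```

Because y increases whenever x does, the ranks coincide and ρ reaches its maximum even though the relationship is not linear.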

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
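
As a minimal sketch of this formula and of the invariance property, the snippet below computes r by hand. The first five amount values are taken from the sample rows of this report; the 0.85 factor used to build the second series is an assumption inferred from those rows, and the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel between numerator and denominator, so plain sums
    of centred products are used.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# First five `amount` values from the sample rows of this report.
amount = [9839.64, 1864.28, 181.00, 181.00, 11668.14]

# Assumption: a pure rescaling, as amount_converted appears to be.
r_scaled = pearson_r(amount, [0.85 * a for a in amount])

# A change of both location and scale leaves r unchanged as well.
r_shifted = pearson_r(amount, [3.0 * a + 100.0 for a in amount])

print(r_scaled, r_shifted)
```

Both values come out at 1 up to floating-point error, which is consistent with the high-correlation alerts raised between amount and amount_converted.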

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
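
The pair-counting definition translates almost literally into code. The sketch below implements that simple form (often called τ-a), in which tied pairs count as neither concordant nor discordant; the profiler itself may use a tie-adjusted variant, so treat this only as an illustration. It is quadratic in the number of observations and not meant for a dataset of this size.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y: counted as neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.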

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
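
A sketch of the bias-corrected estimator, as it is commonly written following Bergsma (2013), is given below. It starts from a contingency table of counts and assumes every row and column total is positive. The function name and the toy tables are illustrative, and this is not the profiler's own implementation.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = [[10, 10], [10, 10]]  # rows and columns unrelated
perfect = [[20, 0], [0, 20]]        # row fully determines column
v_independent = cramers_v_corrected(independent)
v_perfect = cramers_v_corrected(perfect)
print(v_independent, v_perfect)
```

The two toy tables hit the ends of the range: the independent table gives 0 and the perfectly associated one gives 1.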

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

index  step  type      amount    nameOrig     oldbalanceOrg  newbalanceOrig  nameDest     oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  amount_converted
0      1     PAYMENT   9839.64   C1231006815  170136.00      160296.36       M1979787155  0.0             0.00            0        0               8363.6940
1      1     PAYMENT   1864.28   C1666544295  21249.00       19384.72        M2044282225  0.0             0.00            0        0               1584.6380
2      1     TRANSFER  181.00    C1305486145  181.00         0.00            C553264065   0.0             0.00            1        0               153.8500
3      1     CASH_OUT  181.00    C840083671   181.00         0.00            C38997010    21182.0         0.00            1        0               153.8500
4      1     PAYMENT   11668.14  C2048537720  41554.00       29885.86        M1230701703  0.0             0.00            0        0               9917.9190
5      1     PAYMENT   7817.71   C90045638    53860.00       46042.29        M573487274   0.0             0.00            0        0               6645.0535
6      1     PAYMENT   7107.77   C154988899   183195.00      176087.23       M408069119   0.0             0.00            0        0               6041.6045
7      1     PAYMENT   7861.64   C1912850431  176087.23      168225.59       M633326333   0.0             0.00            0        0               6682.3940
8      1     PAYMENT   4024.36   C1265012928  2671.00        0.00            M1176932104  0.0             0.00            0        0               3420.7060
9      1     DEBIT     5337.77   C712410124   41720.00       36382.23        C195600860   41898.0         40348.79        0        0               4537.1045

Last rows

index    step  type      amount      nameOrig     oldbalanceOrg  newbalanceOrig  nameDest     oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  amount_converted
6362610  742   TRANSFER  63416.99    C778071008   63416.99       0.0             C1812552860  0.00            0.00            1        0               5.390444e+04
6362611  742   CASH_OUT  63416.99    C994950684   63416.99       0.0             C1662241365  276433.18       339850.17       1        0               5.390444e+04
6362612  743   TRANSFER  1258818.82  C1531301470  1258818.82     0.0             C1470998563  0.00            0.00            1        0               1.069996e+06
6362613  743   CASH_OUT  1258818.82  C1436118706  1258818.82     0.0             C1240760502  503464.50       1762283.33      1        0               1.069996e+06
6362614  743   TRANSFER  339682.13   C2013999242  339682.13      0.0             C1850423904  0.00            0.00            1        0               2.887298e+05
6362615  743   CASH_OUT  339682.13   C786484425   339682.13      0.0             C776919290   0.00            339682.13       1        0               2.887298e+05
6362616  743   TRANSFER  6311409.28  C1529008245  6311409.28     0.0             C1881841831  0.00            0.00            1        0               5.364698e+06
6362617  743   CASH_OUT  6311409.28  C1162922333  6311409.28     0.0             C1365125890  68488.84        6379898.11      1        0               5.364698e+06
6362618  743   TRANSFER  850002.52   C1685995037  850002.52      0.0             C2080388513  0.00            0.00            1        0               7.225021e+05
6362619  743   CASH_OUT  850002.52   C1280323807  850002.52      0.0             C873221189   6510099.11      7360101.63      1        0               7.225021e+05